The Effect of Combining Different Semantic Relations on Arabic Text Classification

Authors

  • Suhad A. Yousif
  • Islam Elkabani
  • Rached Zantout
Abstract

A massive number of documents is posted online every minute. Document classification requires extensive background work on document content, because keyword-based matching alone may not be sufficient. Much research has been carried out in several languages, with significant results. However, Arabic documents still pose a great challenge due to the nature of the Arabic language. Extracting roots or stems from Arabic words and phrases is an important task that must be completed before text classification is applied. This research proposes an algorithm for classifying Arabic text documents using semantic relations between words based on an Arabic thesaurus, mainly synonyms, hyperonyms and hyponyms. The experiments evaluated the results using the F1-measure and compared them with results obtained by existing methods, such as stemmers and part-of-speech taggers; the proposed method using semantic relations showed an improvement of more than 12.6% over those methods. Arabic WordNet was used as the thesaurus for identifying the relations to be examined. The results indicate that the semantic web offers a variety of options for enhancing text classification that are highly competitive with current methods. Future work will include identifying the best relations to use among the 20 available relations.

Keywords: Arabic text classification; Stemmer; Part of speech; Conceptual features; Semantic relations.
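The idea of enriching document features with semantic relations can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the toy dictionary stands in for Arabic WordNet, and the names `TOY_THESAURUS`, `expand_features` and `f1_score` are hypothetical. Each token's synonyms, hyperonyms and hyponyms are added to the term counts, and F1 is computed from raw true-positive, false-positive and false-negative counts.

```python
from collections import Counter

# Toy stand-in for Arabic WordNet (hypothetical entries, for illustration only).
# "قطة" = cat, "هرة" = cat (synonym), "كلب" = dog, "حيوان" = animal.
TOY_THESAURUS = {
    "قطة": {"synonyms": ["هرة"], "hyperonyms": ["حيوان"], "hyponyms": []},
    "كلب": {"synonyms": [], "hyperonyms": ["حيوان"], "hyponyms": []},
}


def expand_features(tokens, thesaurus,
                    relations=("synonyms", "hyperonyms", "hyponyms")):
    """Return term counts augmented with terms reached via the chosen relations."""
    counts = Counter(tokens)
    for token in tokens:
        entry = thesaurus.get(token, {})
        for relation in relations:
            for related in entry.get(relation, []):
                counts[related] += 1
    return counts


def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; 0.0 when it is undefined."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


if __name__ == "__main__":
    document = list(TOY_THESAURUS)  # one "cat" token and one "dog" token
    print(expand_features(document, TOY_THESAURUS))
    print(f1_score(true_positives=8, false_positives=2, false_negatives=2))
```

Restricting `relations` to a subset, for example `("synonyms",)`, gives a simple way to compare how each relation or combination of relations changes the feature vectors passed to a classifier.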

Similar resources

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Detection and Localization of Farsi-Arabic Text in Video Images

Video text detection plays an important role in applications such as semantic-based video analysis, text information retrieval and archiving. In this paper, we propose a Farsi/Arabic text detection approach. First, edges are extracted with an appropriate edge detector, and then artificial corners are extracted using edge cross points. Artificial corner histogram analysis is done for ...

Enhancement of Arabic Text Classification Using Semantic Relations of Arabic WordNet

Corresponding Author: Suhad A. Yousif, Department of Mathematics and Computer Science, Faculty of Science, Beirut Arab University, Lebanon. Abstract: Arabic text classification methods have emerged as a natural result of the existence of a massive amount of varied textual information (written in the Arabic language) on the web. In most text classification processes, featu...

Document Analysis And Classification Based On Passing Window

In this paper, we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...

Publication date: 2015